AD-A118 596   WISCONSIN UNIV-MADISON, MATHEMATICS RESEARCH CENTER   F/G 12/1

THE SELECTION OF SMOOTHNESS PRIORS FOR DISTRIBUTED LAG ESTIMATION (U)
JUN 82   H. AKAIKE   DAAG29-80-C-0041

UNCLASSIFIED   MRC-TSR-2394   NL


AD  A118598 


MRC Technical Summary Report #2394

THE  SELECTION  OF  SMOOTHNESS  PRIORS 
FOR  DISTRIBUTED  LAG  ESTIMATION 

Hirotugu  Akaike 


Mathematics  Research  Center 
University  of  Wisconsin-Madison 
610  Walnut  Street 

Madison,  Wisconsin  53706 

June  1982 


(Received  May  18,  1982) 

Approved for public release
Distribution unlimited

Sponsored  by 

U. S. Army Research Office
P. O. Box 12211

Research  Triangle  Park 

North  Carolina  27709 




ABSTRACT 

In the application of Shiller's smoothness prior for distributed lag
estimation the main difficulty is the selection of hyperparameters of the
prior distribution. In this paper the use of a maximum likelihood procedure
is proposed for this purpose and its performance is demonstrated by numerical
examples.
SIGNIFICANCE AND EXPLANATION


The distributed lag estimator developed by R. J. Shiller based on the
concept of smoothness prior is a significant example that demonstrates the
potential of the Bayesian approach in statistics. However, the practical
application of Shiller's estimator has been hampered by the difficulty of
specifying the prior distribution.

In  this  paper  an  objective  procedure  for  the  selection  of  the  prior 
distribution  is  proposed  and  its  performance  is  checked  by  both  artificial  and 
real  numerical  examples.  The  result  clearly  demonstrates  the  practical 
utility  of  the  procedure.  It  also  shows  the  danger  inherent  in  an  arbitrary 
subjective  choice  of  the  prior  distribution. 

It  is  expected  that  the  result  reported  in  this  paper  will  contribute  to 
the  development  of  practical  applications  of  Bayesian  statistics. 
coefficients to $a_m = 0$ for $m > M$, which will be a reasonable assumption for practical
application if the lag length $M$ is taken sufficiently large.

Shiller demonstrated by numerical examples the superiority of this type of estimator
to both the ordinary least squares estimator and the Almon lag estimator. This is one of
the earliest examples of successful Bayesian modeling of practical importance.

The crucial point in applying Shiller's estimator is the choice of the hyperparameter
$\lambda$. Shiller suggested some rule of thumb for the choice of $\lambda$. However, this is deeply
concerned with the basic problem of the selection of a Bayesian model, which is vital in
implementing a Bayesian procedure.

3. SELECTION OF A PRIOR DISTRIBUTION BY LIKELIHOOD

When there are a finite number of Bayesian models defined by the data distributions
$f_k(\cdot \mid \theta_k)$ and corresponding prior distributions $\pi_k(\theta_k)$ $(k = 1, 2, \ldots, K)$, the posterior
probability of each model is given by

$$p(k \mid y) = \frac{f(y \mid k)\, c_k}{\sum_{j=1}^{K} f(y \mid j)\, c_j} ,$$

where $y$ denotes the observation, $c_k$ denotes the prior probability of the $k$th model
and $f(y \mid k)$ is defined by

$$f(y \mid k) = \int f_k(y \mid \theta_k)\, \pi_k(\theta_k)\, d\theta_k .$$

The above formula of $p(k \mid y)$ shows that it is natural to call $f(y \mid k)$ the likelihood of
the Bayesian model specified by $f_k(\cdot \mid \theta_k)$ and $\pi_k(\theta_k)$.

Recent work by the present author suggests that when there is no further prior
information available and the distributions $f(\cdot \mid k)$ are well separated, i.e., only one
model attains high likelihood for one particular observation, then the equal prior
probability $c_k = 1/K$ is a natural choice that lets the data speak most (Akaike, 1982).
With this choice of the prior probability distribution the maximum likelihood selection
that chooses the Bayesian model with maximum $f(y \mid k)$ is equivalent to the selection by
maximum posterior probability. This shows that under certain circumstances the selection
of a Bayesian model by maximizing the likelihood can be a reasonable procedure from the
Bayesian point of view. In this paper we pursue the possibility of applying this idea to
the problem of selection of the smoothness prior.
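
As a small illustration of the equivalence just stated, the following sketch computes
$p(k \mid y)$ from log likelihoods and confirms that, under the equal prior $c_k = 1/K$, the model
with the largest likelihood also has the largest posterior probability. The function name
`posterior_model_probs` and the numerical log likelihood values are made up for the
illustration and do not come from the paper.

```python
import numpy as np

def posterior_model_probs(log_liks, priors=None):
    """Return p(k|y) given log f(y|k) and prior probabilities c_k."""
    log_liks = np.asarray(log_liks, dtype=float)
    K = log_liks.size
    if priors is None:
        priors = np.full(K, 1.0 / K)      # equal prior c_k = 1/K
    priors = np.asarray(priors, dtype=float)
    log_post = log_liks + np.log(priors)  # log of f(y|k) c_k
    log_post -= log_post.max()            # stabilise before exponentiating
    post = np.exp(log_post)
    return post / post.sum()              # divide by the sum over all models

# Hypothetical log likelihoods of three candidate Bayesian models.
log_liks = np.array([-120.3, -101.7, -115.2])
probs = posterior_model_probs(log_liks)
best_by_likelihood = int(np.argmax(log_liks))
best_by_posterior = int(np.argmax(probs))
```

Working on the log scale avoids underflow when the likelihoods are very small, which is
the usual situation for data sets of realistic size.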

In practical applications of the smoothness prior for the distributed lag estimation
we usually do not know the value of $\sigma$. Accordingly we have to specify a prior
distribution of $\sigma$. To make the resulting estimator applicable to observations in
arbitrary scale unit we consider the use of Jeffreys' ignorance prior $\sigma^{-1}$. For each
particular application we may consider a proper prior distribution $c(u,v)\sigma^{-1}$ obtained by
restricting the range of $\sigma$ to a finite interval $(u,v)$ of positive numbers. However,
since the integral of $p(y \mid a,\sigma)\,p(a \mid \sigma,\lambda)\,\sigma^{-1}$ with respect to $da\,d\sigma$ is finite, we may use
this integral as the likelihood of the Bayesian model specified by $p(y \mid a,\sigma)$ and
$p(a \mid \sigma,\lambda)\,c(u,v)\sigma^{-1}$ with sufficiently small $u$ and sufficiently large $v$. Since only
ratios of the likelihoods are of interest we define the likelihood of the Bayesian model,
specified by the data distribution $p(y \mid a,\sigma)$ and the (improper) prior distribution
$p(a,\sigma \mid \lambda) = p(a \mid \sigma,\lambda)\,\sigma^{-1}$, by

$$p(y \mid \lambda) = \iint p(y \mid a,\sigma)\, p(a,\sigma \mid \lambda)\, da\, d\sigma .$$

Since the hyperparameter $\lambda$ has a clearly defined technical meaning as the ratio of
the standard deviation of an error term to that of a difference of lag coefficients, it is
not difficult to find a reasonable selection of a finite number of possible values of $\lambda$.

The model selection is realized by maximizing the likelihood over this set of $\lambda$.

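For our present model we have

$$p(y \mid a,\sigma)\, p(a,\sigma \mid \lambda) = \left(\frac{1}{2\pi\sigma^2}\right)^{\frac{N+M+1}{2}} |\lambda^2 R_d' R_d|^{1/2}\, \sigma^{-1} \exp\left\{ -\frac{1}{2\sigma^2} \left[ (a - a^*)'(X'X + \lambda^2 R_d' R_d)(a - a^*) + S(\lambda) \right] \right\} ,$$

where $a^* = (X'X + \lambda^2 R_d' R_d)^{-1} X'y$ and $S(\lambda) = \|y\|^2 - a^{*\prime}(X'X + \lambda^2 R_d' R_d)\,a^*$. The likelihood
$p(y \mid \lambda)$ of this model is then given by

$$p(y \mid \lambda) = \tfrac{1}{2}\, \pi^{-N/2}\, \Gamma(N/2)\, S(\lambda)^{-N/2}\, |X'X + \lambda^2 R_d' R_d|^{-1/2}\, |\lambda^2 R_d' R_d|^{1/2} .$$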
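
The step from the joint density to $p(y \mid \lambda)$ can be traced in two integrations. The
following is a sketch under the assumptions that $a$ has $M+1$ components, that $y$ has $N$
components and that $R_d$ is a nonsingular square matrix; it uses only the quantities $a^*$
and $S(\lambda)$ defined in the text.

```latex
\begin{aligned}
\int p(y \mid a,\sigma)\, p(a,\sigma \mid \lambda)\, da
  &= (2\pi\sigma^2)^{-N/2}\, \sigma^{-1}\,
     |\lambda^2 R_d' R_d|^{1/2}\, |X'X + \lambda^2 R_d' R_d|^{-1/2}
     \exp\{-S(\lambda)/(2\sigma^2)\}, \\
\int_0^\infty \sigma^{-N-1} \exp\{-S(\lambda)/(2\sigma^2)\}\, d\sigma
  &= \tfrac{1}{2}\, \Gamma(N/2)\, \{S(\lambda)/2\}^{-N/2}
  \qquad (\text{substitute } t = S(\lambda)/(2\sigma^2)), \\
p(y \mid \lambda)
  &= \tfrac{1}{2}\, \pi^{-N/2}\, \Gamma(N/2)\, S(\lambda)^{-N/2}\,
     |X'X + \lambda^2 R_d' R_d|^{-1/2}\, |\lambda^2 R_d' R_d|^{1/2}.
\end{aligned}
```

The first line is the Gaussian integral over $a$, which contributes the factor
$(2\pi\sigma^2)^{(M+1)/2} |X'X + \lambda^2 R_d' R_d|^{-1/2}$; the second is a Gamma integral in $\sigma$.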


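Obviously the estimator $a^*$ is given as the solution of the least squares problem
that minimizes $\|y^* - X^* a\|^2$, where $X^*$ is formed by stacking $\lambda R_d$ and $X$, and $y^*$ by
stacking a zero vector of length $M+1$ and $y$:

$$X^* = \begin{bmatrix} \lambda R_d \\ X \end{bmatrix} , \qquad y^* = \begin{bmatrix} 0 \\ y \end{bmatrix} .$$

Here it holds that $S(\lambda) = \|y^* - X^* a^*\|^2$. By successively applying the Householder
transformation to $[X^* \; y^*]$, first to transform $\lambda R_d$ into upper triangular form and then to
transform the whole matrix into upper triangular form, the necessary quantities for the
likelihood computation can easily be obtained during the process of the least squares
computation. For the purpose of the comparison of models we may ignore the constant factor
$\tfrac{1}{2}\pi^{-N/2}\Gamma(N/2)$ of the likelihood.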
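
The least squares formulation can be sketched numerically as follows. This is not the
two-stage Householder scheme described above but a single QR factorization of the stacked
matrix (NumPy's `qr` calls LAPACK, which uses Householder reflections). The explicit form
of $R_d$ is an assumption made for the illustration: a square $d$th order difference matrix
in which coefficients before lag 0 are treated as zero. The names `difference_matrix` and
`augmented_fit` and the random test data are hypothetical.

```python
import numpy as np

def difference_matrix(M, d):
    """Assumed form of R_d: (M+1) x (M+1) d-th order difference matrix."""
    D1 = np.eye(M + 1) - np.eye(M + 1, k=-1)   # first difference, a_{-1} = 0
    return np.linalg.matrix_power(D1, d)

def augmented_fit(X, y, R, lam):
    """Solve min ||y* - X* a||^2 and return a*, S(lam), log|X'X + lam^2 R'R|."""
    Xs = np.vstack([lam * R, X])                     # X*: prior rows, then data rows
    ys = np.concatenate([np.zeros(R.shape[0]), y])   # y*: zeros, then observations
    Q, T = np.linalg.qr(Xs)                          # Xs = Q T, T upper triangular
    a_star = np.linalg.solve(T, Q.T @ ys)
    S = float(np.sum((ys - Xs @ a_star) ** 2))       # residual sum of squares
    # X*'X* = T'T, so the log determinant follows from the diagonal of T.
    logdet = 2.0 * float(np.sum(np.log(np.abs(np.diag(T)))))
    return a_star, S, logdet

# Hypothetical data: N = 40 observations, lag length M = 5, d = 2.
rng = np.random.default_rng(0)
X = rng.standard_normal((40, 6))
y = rng.standard_normal(40)
R = difference_matrix(5, 2)
a_star, S, logdet = augmented_fit(X, y, R, lam=2.5)
```

The triangular factor delivers both the estimate and the determinant needed for the
likelihood, which is the point made in the text about obtaining all quantities during the
least squares computation.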

It should be noted here that the likelihood $p(y \mid \lambda)$ is also a function of $d$, the
order of differencing, and $M$, the lag length. Thus our search for the best model is
realized by maximizing $p(y \mid \lambda)$ over some finite number of possible combinations of $\lambda$, $d$
and $M$. The practical utility of this procedure will be demonstrated by numerical examples
in the next section. The feasibility of this type of procedure was first discussed in
Akaike (1980a) and its application to seasonal adjustment was discussed by Akaike
(1980b). The idea of maximizing a likelihood with respect to the hyperparameter is
discussed in an earlier paper by Good (1965).


4. NUMERICAL INVESTIGATION

From the point of view of data analysis a Bayesian model is simply an artificial
construction that allows one to generate an output from a given set of data. Only when we
confirm that it often produces results superior to those obtained by other conventional
procedures can we claim the usefulness of the Bayesian procedure. In this section we will
discuss the practical utility of the distributed lag estimation realized by the maximum
likelihood selection of the smoothness prior.

First we discuss the application to the artificial example discussed by Shiller as his
second example. In this example the input series was the series of four to six month
commercial paper rate, which gave a typical nearly collinear matrix $X$. The output series
$y$ was generated by the relation $y = Xa + w$ with $w = 0.05e$, where $e$ is a vector of
standard normal random numbers. The lag coefficients were defined by $a_m = \phi(\tfrac{1}{2}(m - 9))$
with $M = 19$, where $\phi(t)$ denotes the density of the standard normal distribution. The
dimension, or the length, of $y$ was given by $N = 40$. One data set generated by this
model was used in the subsequent analysis.
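
The lag coefficients of this example are easy to tabulate. The sketch below assumes the
scale factor 1/2 inside the normal density, a reading that reproduces the "True" row of
Table 1 (for instance .054 at lag 5 and .352 at lags 8 and 10); the variable names are
chosen for the illustration only.

```python
import math

def phi(t):
    """Density of the standard normal distribution."""
    return math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)

M = 19  # lag length of the artificial example
a_true = [phi(0.5 * (m - 9)) for m in range(M + 1)]

# Print the coefficients rounded to three decimals, as in the table.
for m, value in enumerate(a_true):
    print(m, f"{value:.3f}")
```

The coefficients form a bell-shaped curve centred at lag 9, which is the kind of smooth
lag distribution the smoothness prior is designed for.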

The search for the smoothness prior was extended over the models defined with
$\lambda = 5 \times 2^k$ $(k = -10,(1),10)$, $M = -1,(1),19$, where $M = -1$ denotes zero regression, and
$d = 1,2,3$. Since we are accustomed to the use of the log likelihood ratio test we used
$(-2)\log p(y \mid \lambda)$ as our criterion. By ignoring the additive constant the criterion is
given by

$$\mathrm{ABIC} = N \log S(\lambda) + \log |X'X + \lambda^2 R_d' R_d| - \log |\lambda^2 R_d' R_d| ,$$

where $N$ is the dimension of the vector $y$, log denotes natural logarithm and ABIC stands
for a Bayesian information criterion (Akaike, 1980a). Our search for the model was
realized by finding the minimum of ABIC.
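
A minimal sketch of this search is given below. Several ingredients are assumptions made
for the illustration rather than details taken from the paper: the explicit form of $R_d$
(a square $d$th order difference matrix with coefficients before lag 0 treated as zero),
the value $N \log \|y\|^2$ used for the zero regression case $M = -1$, the function names, and
the simulated input series, which merely stands in for the commercial paper rate.

```python
import numpy as np

def difference_matrix(M, d):
    """Assumed form of R_d: (M+1) x (M+1) d-th order difference matrix."""
    D1 = np.eye(M + 1) - np.eye(M + 1, k=-1)
    return np.linalg.matrix_power(D1, d)

def lag_matrix(x, M, N):
    """Row i holds x[t], x[t-1], ..., x[t-M] for the i-th of the last N times."""
    off = len(x) - N
    return np.column_stack([x[off - m: off - m + N] for m in range(M + 1)])

def abic(X, y, R, lam):
    """ABIC = N log S + log|X'X + lam^2 R'R| - log|lam^2 R'R|, and a*."""
    P = lam**2 * (R.T @ R)
    A = X.T @ X + P
    a_star = np.linalg.solve(A, X.T @ y)
    S = np.sum((y - X @ a_star) ** 2) + a_star @ P @ a_star   # equals S(lam)
    value = y.size * np.log(S) + np.linalg.slogdet(A)[1] - np.linalg.slogdet(P)[1]
    return float(value), a_star

def select_prior(x, y, max_lag=19, orders=(1, 2, 3), exponents=range(-10, 11)):
    """Grid search for the (M, d, lam) combination with minimum ABIC."""
    N = y.size
    best = {"abic": float(N * np.log(y @ y)), "M": -1, "d": None,
            "lam": None, "a": np.zeros(0)}             # zero regression
    for M in range(max_lag + 1):
        X = lag_matrix(x, M, N)
        for d in orders:
            R = difference_matrix(M, d)
            for k in exponents:
                lam = 5.0 * 2.0**k
                value, a_star = abic(X, y, R, lam)
                if value < best["abic"]:
                    best = {"abic": value, "M": M, "d": d, "lam": lam, "a": a_star}
    return best

# Hypothetical data: a random-walk input and a short non-smooth lag response.
rng = np.random.default_rng(0)
x = np.cumsum(rng.standard_normal(59))
y = lag_matrix(x, 2, 40) @ np.array([1.2, -0.6, 0.4]) + 0.05 * rng.standard_normal(40)
best = select_prior(x, y)
```

The grid has 20 x 3 x 21 candidate models besides zero regression, and each evaluation is
a small linear solve, so an exhaustive search of this size is inexpensive.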

The best choice in terms of the criterion was given by $M = 13$, $d = 2$ and
$\lambda = 5 \times 2^{k}$ (the exponent is illegible in the source). The resulting estimate is shown in
Table 1 along with the true values and the ordinary least squares estimate for $M = 19$,
denoted by LS. By Table 1 it is obvious that the present estimation procedure is producing
a significantly improved estimate over the ordinary least squares estimate. Although the
present result is not directly comparable with Shiller's result due to the use of
different realizations of the error term, the shape of the estimated distributed lag
coefficients is quite similar to some of the best ones given by Shiller.

TABLE 1

RESULTS OF AN EXPERIMENT OF SHILLER'S SECOND EXAMPLE.
LS DENOTES LEAST SQUARES ESTIMATE.

m          0       1       2       3
True    .000    .000    .001    .004
a*     -.011   -.001    .005    .014
LS     -.010    .021   -.045    .037

m          5       6       7       8
True    .054    .130    .242    .352
a*      .045    .136    .240    .347
LS     -.074    .255    .113    .462

m         10      11      12      13
True    .352    .242    .130    .054
a*      .370    .250    .125    .060
LS      .359    .329    .046    .072

m         15      16      17      18
True    .004    .001    .000    .000
a*         0       0       0       0
LS     -.010   -.050    .065   -.008

One might be concerned with the possibility of non-smooth behavior of distributed lag
coefficients. To check the performance of our procedure under such circumstances we
considered a system defined by

$$y_n = 1.2\, x_n - 0.6\, x_{n-1} + 0.4\, x_{n-2} + w_n ,$$


where the input $x_n$ was the same as that in the preceding example and thus $N = 40$, and
$w_n$ was also normal with mean zero and $\sigma = 0.05$. The range of the search for the
parameters was the same as in the preceding example. The minimum of ABIC was attained at
$d = 1$, $M = 2$ and $\lambda = 5 \times 2^{-7}$. The resulting estimate $a^*$ and the estimate $a^{**}$, which
was obtained by the parameters used for the computation of $a^*$ of the preceding example,
are given in Table 2 along with the true values and LS, the ordinary least squares
estimate for $M = 19$.

The most remarkable finding with this result is that our procedure produced an extremely
good result. This was made possible by the correct determination of $M$ and the choice of
a very small value as $\lambda$. Contrary to this, $a^{**}$, which was obtained by the parameters of
Table 1, produced a very poor result, even worse than LS. This clearly demonstrates the
danger of applying a Bayesian model based on an arbitrary choice of the prior
distribution. It is obvious that a proper procedure of adaptation is necessary for the
practical application of the smoothness prior.
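
The contrast between an adapted and an arbitrarily fixed prior can be reproduced in
outline on simulated data. In the sketch below the input series is a hypothetical random
walk rather than the commercial paper rate, the form of $R_d$ is the same assumed square
difference matrix with coefficients before lag 0 treated as zero, and the value
$\lambda = 40$ used for the mismatched prior is made up, so the numbers it produces are not
those of Table 2.

```python
import numpy as np

rng = np.random.default_rng(1)
N, M_MAX = 40, 19
x = np.cumsum(rng.standard_normal(N + M_MAX))   # hypothetical smooth input
a_true = np.zeros(M_MAX + 1)
a_true[:3] = [1.2, -0.6, 0.4]                   # non-smooth lag coefficients

def lagged(x, M, N):
    """Design matrix with columns x[t], x[t-1], ..., x[t-M]."""
    off = len(x) - N
    return np.column_stack([x[off - m: off - m + N] for m in range(M + 1)])

y = lagged(x, M_MAX, N) @ a_true + 0.05 * rng.standard_normal(N)

def smooth_estimate(M, d, lam):
    """Smoothness-prior estimate a*, padded with zeros up to lag M_MAX."""
    X = lagged(x, M, N)
    D1 = np.eye(M + 1) - np.eye(M + 1, k=-1)
    R = np.linalg.matrix_power(D1, d)           # assumed form of R_d
    a = np.linalg.solve(X.T @ X + lam**2 * (R.T @ R), X.T @ y)
    return np.concatenate([a, np.zeros(M_MAX - M)])

def error(a):
    """Euclidean distance from the true coefficient vector."""
    return float(np.sqrt(np.sum((a - a_true) ** 2)))

adapted = smooth_estimate(M=2, d=1, lam=5 * 2.0**-7)   # setting reported for this example
mismatched = smooth_estimate(M=13, d=2, lam=40.0)      # strong smoothing, made-up lam
print(error(adapted), error(mismatched))
```

With a heavy second order smoothness penalty the estimate cannot follow the abrupt
pattern 1.2, -0.6, 0.4, which is the mechanism behind the poor result described above.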

Having confirmed the performance of our procedure at the two extreme situations, most
favorable and unfavorable, we now turn to the example of real data handled by Shiller as
his first example. In this example Shiller analyzed the response of the Federal Reserve
Board Aaa new issue yield series to the four to six month prime commercial paper rate. Due
to the unavailability of the Federal Reserve Board Aaa new issue yield series we used the
corresponding series of Moody's AAA bond yield as the output series. The similarity of our
result to Shiller's confirms that the substitution did not change the essential aspect of
the problem.

TABLE 2

TEST RESULT FOR NON-SMOOTH DISTRIBUTED LAG COEFFICIENTS.
a** DENOTES THE ESTIMATE OBTAINED BY THE PARAMETERS OF a* OF TABLE 1.

m          0       1       2       3       4
True     1.2     -.6      .4       0       0
a*     1.206   -.601    .394       0       0
a**     .659    .068   -.075    .051    .090
LS     1.193   -.587    .390   -.044    .045

m          5       6       7       8       9
True       0       0       0       0       0
a*         0       0       0       0       0
a**     .037   -.025   -.012    .021   -.010
LS      .018   -.018    .037   -.037    .026

m         10      11      12      13      14
True       0       0       0       0       0
a*         0       0       0       0       0
a**    -.000    .039   -.000   -.038       0
LS     -.029    .030   -.027    .059   -.067

m         15      16      17      18      19
True       0       0       0       0       0
a*         0       0       0       0       0
a**        0       0       0       0       0
LS      .028   -.015   -.052    .092   -.043

In this example it is already noticed by Shiller that the coefficient $a_0$ has a
different characteristic from other coefficients and should be freed from the rest. The
validity of this hypothesis can easily be checked by our present approach by considering
models obtained by multiplying the first row of $R_d$ by a small positive number $\delta$. For
the present example ABIC kept decreasing when $\delta$ was decreased from 1 to 0.0001 in
several steps. This confirmed the validity of Shiller's argument. The following result
was obtained with $\delta = 0.0001$ and the same range of the parameters as in the preceding
examples.

The minimum of ABIC was attained at $d = 3$, $M = 19$ and $\lambda = 5 \times 2^{5}$. The amount of
reduction of the minimum ABIC obtained by reducing $\delta$ from 1 to 0.001 was 20.7. The
estimate $a^*$ is given in Table 3 along with the estimate $a^{1.0}$ obtained by putting
$\delta = 1.0$. In this example a constant term was included to represent the effect of
non-zero average values of the input and output series. A further estimate was
obtained from the series of first order differences of the input and output series. This
was to check the possible effect of the trends of both series. This estimate was
obtained with $d = 3$, $M = 19$ and $\lambda = 5 \times 2^{k}$ (the exponent is illegible in the source).
The corresponding ordinary least squares estimate for the differenced series is given by LS.

The similarity between $a^*$ and the estimate from the differenced series is remarkable and
confirms that the smooth behavior of $a^*$ does not represent the spurious response due to
the trend components. Also the distortion caused by the inclusion of $a_0$ can be seen
clearly from $a^{1.0}$. The usual erratic pattern of the least squares estimate persists in
this example.

Summarizing our observations of numerical results, including those not reported here,
we may conclude as follows:

1) the estimator is most sensitive to the choice of $\lambda$,

2) the choice of $M$, the lag length, is also fairly critical,

3) the choice of $d$, the order of differencing, is not so critical.

We also found that the selection of $M$ must be done with ABIC minimized with respect to
$\lambda$. Without the adjustment of $\lambda$ the selection of $M$ by minimizing ABIC produced poor
results.


5. DISCUSSION


The purpose of the present paper has been to show the feasibility of an objectively
defined procedure for the selection of the smoothness prior for the estimation of
distributed lag coefficients. It was confirmed that the maximum likelihood procedure
proposed in this paper can produce results comparable to the results reported by Shiller in
his original paper. Since Shiller's results may be considered as typical examples produced
by an expert, this shows that the present procedure is producing a good approximation to the
judgement procedure of an expert. The next step of the Bayesian modeling will be the
specification of a prior distribution for the lag length $M$.

The result of Table 2 demonstrated the robustness of the present estimation
procedure. However, it also demonstrated the danger inherent in the purely subjective
selection of a prior distribution.

Although  the  result  reported  in  this  paper  has  demonstrated  the  feasibility  of  the 
smoothness  prior  selection  for  distributed  lag  estimation,  the  practical  applicability  of 
the  single  input  single  output  model  to  the  analysis  of  economic  data  is  rather  limited. 
This  is  due  to  the  common  existence  of  feedback  between  the  input  and  output  in  econometric 
applications.  In  such  a  case  multivariate  time  series  modeling  is  required.  Whether  the 
smoothness  prior  can  find  a  useful  application  in  this  case  is  a  subject  of  further  study. 